The Large Area Telescope (LAT) event analysis is the final stage in the event reconstruction responsible for
the creation of high-level variables (e.g., event energy, incident direction, and particle type). We discuss the
development of TMine, a powerful new tool for designing and implementing event classification analyses (e.g., 
distinguishing photons from charged particles). TMine is built on ROOT, a data analysis framework that
is the de facto standard for current high energy physics experiments; thus, TMine fits naturally into the
ROOT-based data processing pipeline of the LAT. TMine provides a visual development environment for the LAT
event analysis and utilizes advanced multivariate classification algorithms implemented in ROOT. We discuss 
the application of TMine to the next iteration of the event analysis (Pass 8), the LAT charged-particle analyses, 
and the classification of unassociated LAT γ-ray sources.



1. Fermi-LAT Event Analysis 

The Large Area Telescope (LAT) operates in a low 
Earth orbit, where every second thousands of parti- 
cles trigger the detector. After on-board filtering, the 
recorded data from these triggers are transmitted to 
the ground and undergo full event reconstruction. The 
final stage of LAT reconstruction is the event analy- 
sis, which combines information from each detector 
subsystem (the anticoincidence detector, tracker, and 
calorimeter) to create a picture of the event as a whole. 
From the event picture, high-level science variables 
(i.e., event energy and incident direction) are assigned. 
The event analysis must also address the challenging 
task of separating the desired γ-ray signal events from
charged particle backgrounds [1]. 

The assignment of fundamental quantities such as 
particle type, energy, and direction is a complex prob- 
lem, since the LAT accepts particles over a wide range 
in parameter space (both in energy and incident angle) 
and event topology (close to detector edges and gaps). 
In addition, discrimination against background at a 
level of 1 part in 10^6 is required to fulfill the LAT science
goals. Classic cut-based analyses lack sufficient
accuracy and signal efficiency to meet these goals. 
To achieve the required instrument performance, the 
LAT event analysis applies classic cuts followed by 
multivariate classification trees [2].

Classification trees (and decision trees in general)
belong to the larger family of data mining and ma- 
chine learning algorithms [3]. Classification in the 
context of machine learning focuses on associating an 
observation to a sub-population based on the traits 
present in a set of training observations (where the 
true sub-population is known). Training of classification
trees is performed through binary recursive partitioning,
an algorithm that develops a set of logical cuts
by iteratively splitting the training data to maximize 
the separation of the true sub-populations. For the 
LAT event analysis, the training of classification trees 
is performed on sets of γ-ray and cosmic-ray events
generated from a full detector Monte Carlo simulation. 
These logical cuts are trained on variables describing 
the physical character of an event shower (e.g., the 
transverse shower size in the calorimeter, the num- 
ber of excess tracker clusters surrounding the primary 
particle track, etc.), while the output is a classifica- 
tion of the event (e.g., the type of particle, the quality 
of direction reconstruction, etc.). 
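
A single step of binary recursive partitioning can be sketched as follows. This is a minimal illustration, not the LAT or TMine implementation: it assumes the Gini impurity as the separation measure, and the function names and the toy "shower size" values are hypothetical. A full classification tree would apply the same search recursively to each of the two resulting subsets and over every input variable.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = background, 1 = signal)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, impurity) for the cut `value < threshold` that
    minimizes the size-weighted Gini impurity of the two subsets.
    Returns (None, parent impurity) if no cut improves on the parent."""
    n = len(values)
    best = (None, gini(labels))
    candidates = sorted(set(values))
    # Try a cut halfway between each pair of adjacent distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        cut = 0.5 * (lo + hi)
        left = [y for x, y in zip(values, labels) if x < cut]
        right = [y for x, y in zip(values, labels) if x >= cut]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (cut, impurity)
    return best


# Toy training sample (invented numbers): one shower-size value per event
# and the true class taken from simulation (1 = gamma ray, 0 = cosmic ray).
shower_size = [1.0, 1.2, 1.5, 1.7, 3.0, 3.4, 3.9, 4.1]
is_gamma = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, impurity = best_split(shower_size, is_gamma)
```

In this toy sample the two classes do not overlap, so the selected cut falls between the largest γ-ray value and the smallest cosmic-ray value and leaves both subsets pure.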

We introduce TMine, a new tool for implementing 
both cut-based and multivariate classification algo- 
rithms. The goal of TMine is to enhance the perfor- 
mance of the event level analysis to improve the LAT 
instrument response functions (i.e., effective area, en- 
ergy resolution, and point spread function). Addition- 
ally, TMine has been used for studying LAT charged 
particle events (electrons, positrons, and protons) and 
the classification of unassociated LAT γ-ray sources.



2. The TMine Analysis Tool 

TMine is an interactive software tool for developing
and processing complex event classification analyses.
TMine is based on ROOT [4], the de facto data
analysis framework for current high energy physics ex- 
periments. In particular, TMine uses the data set in- 
dexing and linking functionality of ROOT to associate 
newly calculated variables with pre-existing quantities 
and keeps only the minimal information necessary to 
process the analysis. Thus, TMine handles large data 
sets in a quick and efficient manner, especially when 
some variables are only defined for a small subset of
the events. 

Figure 1: The current iteration of the Fermi-LAT event analysis as viewed by TMine. Many nodes contain sub-analyses
(inset top left), while the functionality of each node can be plotted (inset top middle) and edited through a GUI editor
(inset top right). A TMine analysis can combine classical cut-based selections with multivariate classification.

TMine applies classic event-selection cuts in the
standard ROOT manner through TFormulas, TCuts,
and event indexing. For the processing and parallel
evaluation of sophisticated multivariate classification
algorithms, TMine utilizes the ROOT Toolkit for
Multivariate Analysis (TMVA) [5]. Through TMVA, 
TMine has access to many multivariate classification 
algorithms including, but not limited to, boosted de- 
cision trees and artificial neural networks. While the 
command-line functionality of ROOT is preserved, the
graphical user interface of TMine allows the user to 
harness the power of ROOT and TMVA in a visual 
work environment. TMine was specifically designed to 
address problems faced in high energy physics, though 
it need not be restricted to these. 

A TMine analysis consists of a network of directionally
linked nodes controlling work flow and operation
(Figure 1). Nodes both alter event characteristics
(i.e., variable definition, assignment, and selection)
and direct events through the network. Specialized
nodes are used for training, testing, and implementing
TMVA classification algorithms. Using the
machinery of ROOT, TMine is able to split, manipu- 
late, and recombine large quantities of data without 
excessive duplication of information. Structuring the 
event analysis in a visual manner has been found to be 
conceptually powerful when designing the LAT event 
analysis.
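
The idea of events flowing through linked nodes can be illustrated with a short sketch. The class names `DefineNode` and `CutNode`, the helper `run_chain`, and the event variables are all hypothetical and are not part of TMine or ROOT. The sketch handles only a linear chain and copies events freely, whereas TMine supports branching networks and relies on ROOT indexing to avoid duplicating data.

```python
import math


class DefineNode:
    """Adds a new variable to every event (hypothetical node type)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def process(self, events):
        # Build a new dict per event so the input list is left unchanged.
        return [dict(e, **{self.name: self.func(e)}) for e in events]


class CutNode:
    """Passes only the events that satisfy a selection (hypothetical)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def process(self, events):
        return [e for e in events if self.predicate(e)]


def run_chain(nodes, events):
    """Send events through a linear chain of linked nodes, in order."""
    for node in nodes:
        events = node.process(events)
    return events


# Invented events with two made-up variables.
events = [
    {"energy_mev": 120.0, "n_tracks": 1},
    {"energy_mev": 5000.0, "n_tracks": 0},
    {"energy_mev": 800.0, "n_tracks": 2},
]
chain = [
    CutNode(lambda e: e["n_tracks"] > 0),            # classic selection cut
    DefineNode("log_energy", lambda e: math.log10(e["energy_mev"])),
    CutNode(lambda e: e["log_energy"] > 2.5),        # cut on the new variable
]
selected = run_chain(chain, events)
```

Because each node only sees the output of the node before it, a variable defined upstream (here `log_energy`) is available to every downstream selection.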



3. Applications of TMine 

3.1. The Pass-8 Reconstruction Effort

Figure 2: Comparisons between a statistical sample of
photons from flight data (blue) and simulations (red).
Only variables with good agreement should be used for
classification.

Our primary application of TMine is in the development
and implementation of the Pass-8 event analysis.
The Pass-8 effort is a complete reworking of the
LAT simulation and reconstruction software, benefiting
from the analysis of flight data (which was unavailable
before launch). TMine will improve the interface
between event reconstruction and event classification. 
It also provides improvements to the structure and 
validation of the Pass-8 event analysis. TMine has 
built-in functionality for comparing real and simulated 
data (Figure 2), an essential step prior to training
multivariate classification algorithms [6, 7]. Additionally,
the TMine interface to TMVA allows for the training 
of multivariate classification algorithms using larger 
data sets than was possible with the software tools 
previously used by the Fermi-LAT Collaboration. 
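
One simple way to quantify the agreement between flight data and simulation for a candidate input variable is sketched below. The source does not state which comparison TMine performs, so the two-sample Kolmogorov-Smirnov statistic used here is an assumed stand-in, and the function name and sample values are invented for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every entry equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


# Invented values of one variable in flight data and in two simulations.
flight = [0.1, 0.2, 0.3, 0.4]
sim_similar = [0.15, 0.25, 0.35, 0.45]
sim_shifted = [1.1, 1.2, 1.3, 1.4]

d_similar = ks_statistic(flight, sim_similar)
d_shifted = ks_statistic(flight, sim_shifted)
```

A variable whose statistic is small for the real and simulated samples would be a reasonable candidate for classifier training, while a large value (as for the shifted sample) would flag a variable that the simulation does not reproduce.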

3.2. LAT Charged-Particle Analyses 

In addition to the Pass-8 effort, TMine has been uti- 
lized in a variety of ongoing LAT analyses. Since elec- 
tromagnetic showers are common to photon, electron, 
and positron events, the LAT is naturally sensitive to 
cosmic-ray electrons and positrons. For the majority
of LAT analyses, these charged particles present
a background for γ-ray science. Thus, the detection of
electrons and positrons requires a non-standard event 
analysis and a reprocessing of the LAT data (the anal- 
ysis of electrons and positrons has subsequently been 
appended to the standard event analysis). TMine was 
used for this reprocessing because it is a stand-alone 
program that is free from the overhead of the full LAT 
reconstruction software. 

A similar effort is underway to study cosmic-ray 
proton events in more detail [8]. For this task, TMine
was used both to design a proton event classification 
and to reprocess LAT data. The analysis of proton 
events presents an excellent example of how TMine 
can be used for event classification. Figure 3 shows a
simple event analysis for distinguishing hadrons from
leptons. This worksheet is read from left to right,
with the training data set input on the left and the
predicted particle type output on the right. A classic
cut selecting charged particles is applied first, followed
by a split and tagging of the true particle type. Events
are then recombined and used to train TMVA boosted
decision trees. The preliminary performance of this
classifier when discriminating simulated hadrons from
simulated electrons and positrons is shown in Figure 4.

Figure 3: A simple TMine worksheet for discriminating
cosmic-ray hadron events from cosmic-ray lepton events.
Data is input on the left, has a classical charged particle
cut applied, and is used to train a TMVA classifier.

Figure 4: Classifier output from the TMine
implementation of a TMVA boosted decision tree.
Simulated hadrons (marked signal) are distinguished
from simulated electrons and positrons (marked
background). Events that are hadron-like are assigned
positive predictor values, while events that are lepton-like
are assigned negative values. The two event classes are
well separated, and an independent sample of test events
(filled histograms) agrees with the distribution of events
used to train the classifier (data points).



3.3. Classifying Unassociated LAT Sources

Figure 5: Spatial distribution, in Galactic coordinates,
for 1FGL unassociated sources classified as AGN
candidates (blue diamonds) and pulsar candidates (red
circles). As expected, pulsar candidates are distributed
primarily along the Galactic plane, while AGN
candidates are distributed isotropically. The sources left
unclassified are shown as green crosses.

While TMine was originally developed for use with
the LAT event analysis, it is not limited to that purpose.
Notably, TMine has been utilized to classify
unassociated γ-ray sources. Of the 1451 γ-ray
sources in the First LAT Source Catalog (1FGL) [10],
630 are unassociated with counterparts in other wave- 
lengths. In an attempt to classify these sources, 
TMine was used to input individual source character- 
istics, such as spectral index, spectral curvature, and 
fractional variability into a forest of TMVA boosted 
decision trees. These input variables were selected to 
be independent of source flux, location, or significance, 
since these distributions differ between associated and 
unassociated sources. The TMVA decision trees were 
trained on the set of 1FGL sources already associated
with active galactic nuclei (AGN) and pulsars. The 
output of this analysis was a predictor representing 
the probability that a source is an AGN versus a pul- 
sar. 

Unassociated sources were separated into AGN can- 
didates and pulsar candidates by cutting on the out- 
put of the classifier. This cut was designed to have 
80% efficiency when applied to an independent set of 
sources associated with AGN and pulsars in the 1FGL.
The Galactic latitude of the unassociated sources was
explicitly omitted from the classifier training, but the
AGN candidates were found to be distributed
isotropically, while the pulsar candidates
were distributed along the Galactic plane (Figure 5).
From follow-up observations on a subset of the unas- 
sociated sources, the cut placed on the multivariate 
classifier is found to be ~70% efficient with a contamination
of ~5% for both AGN and pulsars [9].
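
Choosing a cut on the classifier output for a target efficiency can be sketched as follows. The definitions are assumptions, since the source does not spell them out: efficiency is taken as the fraction of sources of the true class that pass the cut, and contamination as the fraction of passing sources that belong to the other class. The function names and predictor values are invented and are not 1FGL data.

```python
def cut_for_efficiency(signal_scores, efficiency):
    """Return the threshold on the classifier output such that roughly a
    fraction `efficiency` of the signal sample has score >= threshold."""
    ranked = sorted(signal_scores, reverse=True)
    n_keep = max(1, round(efficiency * len(ranked)))
    return ranked[n_keep - 1]


def contamination(signal_scores, background_scores, threshold):
    """Fraction of the sources passing the cut that are background."""
    n_sig = sum(s >= threshold for s in signal_scores)
    n_bkg = sum(s >= threshold for s in background_scores)
    return n_bkg / (n_sig + n_bkg)


# Made-up predictor values for sources of known type (higher = more AGN-like).
agn_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.30, 0.10]
pulsar_scores = [0.72, 0.40, 0.35, 0.20, 0.15, 0.10, 0.05, 0.02, 0.01, 0.00]

threshold = cut_for_efficiency(agn_scores, 0.80)
fraction_wrong = contamination(agn_scores, pulsar_scores, threshold)
```

Deriving the threshold from an independent sample of known sources, rather than from the training sample, keeps the quoted efficiency from being biased by overtraining.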



4. Conclusions 

We present TMine, a new tool for developing and 
processing complex classification tasks. TMine is a
ROOT-based tool utilizing the multivariate classification
package, TMVA. While the primary application
of TMine is to the LAT event analyses (specifically 
the Pass-8 iteration), it has a wide range of possible 
applications. 